Dear Sir or Madam, May I introduce the YAFC Corpus: Corpus, Benchmarks and Metrics for Formality Style Transfer
Authors
Abstract
Style transfer is the task of automatically transforming a piece of text in one particular style into another. A major barrier to progress in this field has been a lack of training and evaluation datasets, as well as benchmarks and automatic metrics. In this work, we create the largest corpus for a particular stylistic transfer (formality) and show that techniques from the machine translation community can serve as strong baselines for future work. We also discuss challenges of using automatic metrics.
Similar resources
SQUINKY! A Corpus of Sentence-level Formality, Informativeness, and Implicature
We introduce a corpus of 7,032 sentences rated by human annotators for formality, informativeness, and implicature on a 1-7 scale. The corpus was annotated using Amazon Mechanical Turk. Reliability in the obtained judgments was examined by comparing mean ratings across two MTurk experiments, and correlation with pilot annotations (on sentence formality) conducted in a more controlled setting. ...
A Naïve Bayes classifier for Shakespeare's second-person pronoun
In order to investigate in explicit detail the way that y- and th- pronouns alternate in the Shakespearean corpus, I have undertaken a collocational analysis of the full corpus of Shakespeare’s 37 plays and found that (1) second-person pronouns can be disambiguated based on context alone, (2) y- pronouns seem to be used in more formal situations or when an inferior is addressing a social better, and ...
I-29: Luteal Phase Support in Frozen-Thawed Embryo Transfer Cycle
The cumulative pregnancy rate has increased significantly since frozen-thawed embryo transfer was applied in ART cycles. This method has become an essential part of IVF/ICSI treatment. Luteal phase support has been shown to be associated with a higher live birth rate. Human chorionic gonadotropin (HCG) and progesterone have been successfully used for luteal phase support in ovarian stim...
A Corpus-based Analysis of Collocational Errors in the Iranian EFL Learners' Oral Production
Collocations are one of the areas generally considered problematic for EFL learners. Iranian learners of English, like other EFL learners, face various problems in producing oral collocations. An analysis of learners' spoken interlanguage indicates both the scope of the problem and the need for learners to spend more time and energy on mastering collocations. The present study specifically f...
Fiber Tractography and Diffusion Tensor Imaging in Children with Agenesis and Dysgenesis of Corpus Callosum: A Clinico-Radiological Correlation
Background: The corpus callosum is the largest commissure in the human brain. It consists of tightly packed white matter tracts connecting the two cerebral hemispheres. In this study we aimed to evaluate the role of fiber tractography (FT) and diffusion tensor imaging (DTI) in ped...